An Efficient Algorithm for Identifying the Most Contributory Substring

Author

  • Ben Stephenson
Abstract

Detecting repeated portions of strings has important applications in many areas of study, including data compression and computational biology. This paper defines the Most Contributory Substring Problem, which identifies the single substring that represents the largest proportion of the characters within a set of strings, and presents a solution to it. We show that the problem can be solved in O(n) time (where n is the total number of characters in all of the input strings) when overlapping occurrences of the most contributory substring are permitted. Furthermore, we present an extended algorithm that does not permit occurrences of the most contributory substring to overlap. The expected running time of the extended algorithm is O(n log n), while its worst-case running time is O(n^2).
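To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm and comes nowhere near its O(n) bound. It assumes that a substring's "contribution" is its length multiplied by its number of occurrences across all input strings, which is one reading of the abstract rather than a definition taken from the paper. The function names and the alphabetical tie-breaking rule are hypothetical.

```python
def count_occurrences(text, sub, allow_overlap):
    """Count occurrences of sub in text, scanning left to right."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by 1 to permit overlaps, or by len(sub) to forbid them.
        step = 1 if allow_overlap else len(sub)
        start = text.find(sub, start + step)
    return count


def most_contributory_substring(strings, allow_overlap=True):
    """Brute-force search for the substring with the largest contribution.

    Assumption (not from the paper): contribution = len(sub) * occurrences,
    summed over all input strings. Ties go to the alphabetically first
    candidate.
    """
    # Enumerate every distinct non-empty substring of every input string.
    candidates = set()
    for s in strings:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                candidates.add(s[i:j])

    best, best_score = "", 0
    for sub in sorted(candidates):  # sorted for deterministic tie-breaking
        occurrences = sum(
            count_occurrences(s, sub, allow_overlap) for s in strings
        )
        score = len(sub) * occurrences
        if score > best_score:
            best, best_score = sub, score
    return best, best_score


if __name__ == "__main__":
    print(most_contributory_substring(["xabab", "abab", "aby"]))
    print(most_contributory_substring(["aaaa"], allow_overlap=True))
    print(most_contributory_substring(["aaaa"], allow_overlap=False))
```

The `allow_overlap` flag mirrors the two variants in the abstract: with overlaps permitted, "aa" is counted three times in "aaaa"; without overlaps, only twice.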

Similar resources

Identifying Rhythms in Musical Texts

A fundamental problem in music is to classify songs according to their rhythm. A rhythm is represented by a sequence of “Quick” (Q) and “Slow” (S) symbols, which correspond to the (relative) duration of notes, such that S = 2Q. In this paper, we present an efficient algorithm for locating the maximum-length substring of a music text t that can be covered by a given rhythm r.

Identifying and Evaluating Effective Factors in Green Supplier Selection using Association Rules Analysis

Nowadays companies measure suppliers on the basis of a variety of factors and criteria that affect the supplier selection issue. This paper aims to identify the key effective criteria for the selection of green suppliers through an efficient algorithm called iterative process mining, or i-PM. Green data were collected first by reviewing the previous studies to identify various environmental cri...

The Efficient Computation of Complete and Concise Substring Scales with Suffix Trees

Strings are an important part of most real application multivalued contexts. Their conceptual treatment requires the definition of substring scales, i.e., sets of relevant substrings, so as to form informative concepts. However, these scales are either defined by hand or derived in a context-unaware manner (e.g., all words occurring in string values). We present an efficient algorithm based on s...

Generalized Substring Compression

In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same with the following twist. The queries contain an additional context substring (or a collection of context substrings) and the answers are the substring in compressed format, where the context substring is used to make the compression...

An Energy-efficient Mathematical Model for the Resource-constrained Project Scheduling Problem: An Evolutionary Algorithm

In this paper, we propose an energy-efficient mathematical model for the resource-constrained project scheduling problem to optimize makespan and consumption of energy, simultaneously. In the proposed model, resources are speed-scaling machines. The problem is NP-hard in the strong sense. Therefore, a multi-objective fruit fly optimization algorithm (MOFOA) is developed. The MOFOA uses the VIKO...

Publication date: 2007